- The History and Philosophy of Project Gutenberg (c)August 1992
-
- Second edition prepared for August, 1992. Updated regularly.
- (margins are 62, about 10 pages, send only the complete file.)
- (Includes answers to many Frequently Asked Questions (FAQ))
-
- There is a lot of information in this little file. . .and your
- requested information may be contained in a short portion. It
- is therefore recommended that you search for subjects. It was
- not feasible to break this file into smaller ones, but we have
- been told that our audience responds best to quick, short, and
- concise responses. These are marked by subject headers and by
- paragraphing. Read fast, it is all quite simple. If you find
- something of great interest, you might want to read it again.
-
- The purpose of this file is to answer questions. . .not create
- flames. We have long ago learned that flamers must be allowed
- to burn themselves out. However, we feel obliged to answer in
- the forums in which the flames were posted. . .not to satisfy,
- can't be done, the flamers, but to explain to the rest of that
- audience what Project Gutenberg is and is not, however flamers
- may have misstated the obvious. Etext is certainly one of the
- most obvious uses of computers, and the flamers can hardly put
- a dent in that fact. Plain Vanilla ASCII is also obviously an
- important etext medium, but no one at Project Gutenberg states
- that it is or should be the only etext medium.
-
- "When you get something for free, you get what you pay for!!!"
- That means if you don't use what you get for free, it won't do
- you any good. But sometimes it is nice to have a library your
- friends and family can use, even if they don't always use it.
-
-
- The Beginning
-
- Project Gutenberg began in 1971 when Michael Hart was given an
- operator's account with $100,000,000 of computer time in it by
- the operators of the Xerox Sigma V mainframe at the Materials
- Research Lab at the University of Illinois.
-
- This was totally serendipitous, as it turned out that two of a
- four operator crew happened to be the best friend of Michael's
- and the best friend of his brother. Michael just happened "to
- be at the right place at the right time" at the time there was
- more computer time than people knew what to do with, and those
- operators were encouraged to do whatever they wanted with that
- fortune in "spare time" in the hopes they would learn more for
- their job proficiency.
-
- At any rate, Michael decided there was nothing he could do, in
- the way of "normal computing," that would repay the huge value
- of the computer time he had been given. . .so he had to create
- $100,000,000 worth of value in some other manner. An hour and
- 47 minutes later, he announced that the greatest value created
- by computers would not be computing, but would be the storage,
- retrieval, and searching of what was stored in our libraries.
-
- He then proceeded to type in the "Declaration of Independence"
- and tried to send it to everyone on the networks. . .which can
- only be described today as a not so narrow miss at creating an
- early version of what was later called the "Internet Virus."
-
- A friendly dissuasion from this yielded the first posting of a
- document in electronic text, and Project Gutenberg was born as
- Michael stated that he had "earned" the $100,000,000 because a
- copy of the Declaration of Independence would eventually be an
- electronic fixture in the computer libraries of 100,000,000 of
- the computer users of the future.
-
- The Beginning of the Project Gutenberg Philosophy
-
- The premise on which Michael Hart based Project Gutenberg was:
- anything that can be entered into a computer can be reproduced
- indefinitely. . .what Michael termed "Replicator Technology."
- The concept of Replicator Technology is simple; once a book or
- any other item (including pictures, sounds, and even 3-D items)
- can be stored in a computer, then any number of copies can and
- will be available. Everyone in the world, or even not in this
- world (given satellite transmission) can have a copy of a book
- that has been entered into a computer.
-
- This philosophical premise has created several offshoots:
-
- 1. Electronic Texts (Etexts) created by Project Gutenberg are
- to be made available in the simplest, easiest to use forms
- available.
-
- 2. Suggestions to make them less readily available are not to
- be treated lightly.
-
- Therefore, Project Gutenberg Etexts are made available in what
- has become known as "Plain Vanilla ASCII," meaning the low set
- of the American Standard Code for Information Interchange: i.e.
- the same kind of character you read on a normal printed page--
- italics, underlines, and bolds have been capitalized.
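-
- As a rough illustration of the convention just described, the
- sketch below checks that a file stays within the low (7-bit)
- set of ASCII and turns emphasized words into capitals. Both
- function names and the _underscore_ emphasis marker are merely
- assumptions made for this example; Project Gutenberg does not
- specify either.

```python
import re


def is_plain_vanilla_ascii(data: bytes) -> bool:
    """Return True if every byte is in the low (7-bit) ASCII set."""
    return all(byte < 0x80 for byte in data)


def capitalize_emphasis(text: str) -> str:
    """Replace _emphasized_ spans with capitals.

    The underscore marker is a hypothetical input convention.
    """
    return re.sub(r"_([^_]+)_", lambda match: match.group(1).upper(), text)
```

- A file that fails the first check would need its accented or
- typographic characters replaced before it counted as "Plain
- Vanilla" in the sense used here.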
-
- *** (Parenthetical discussion on bold, italics and underlines)
- This next paragraph may be skipped if you wish; it was created
- in response to severe flaming on several occasions. (In many
- conversations with authors, and those who research the authors
- whom we publish, we have determined that most selections of an
- assortment of possible emphases were made by the editors, with
- little or no consultation to the authors. Thus we have little
- motivation to continue our previous efforts to determine a way
- to present italics, bolds and underlines in any other way than
- by capitalizing them. In our estimation, the authors are the
- final authority, and they say they merely intend to emphasize,
- not that they have a particular affinity for one form over the
- others. Please remember, we only talked to many authors, most
- of whom said they had no affinity for any particular method of
- emphasis (i.e. they didn't really care how most
- emphases were made. . .via italics, bold, or underline). This
- does NOT mean we talked to ALL authors, or that ALL said this.
- This disclaimer is to mollify the flames we constantly get for
- this. One quite famous author and editor has said that we may
- as well get rid of all the capitals and punctuation, if we are
- not going to do italics, bold and underline. Actually when we
- started Project Gutenberg, there was no case distinction, very
- few punctuation marks, and it was not terribly easy to read an
- original etext of the Declaration of Independence. We try for
- readability by HUMANS in the first place, and by programs as a
- secondary feature. We LIKE the idea that programs should read
- our files easily, BUT NOT TO THE EXCLUSION OF HUMANS. Thus we
- do not use intrusive forms of markup, either those that would
- make it difficult for many humans to read, or those that would
- make it impossible for programs to read and search. Please no
- more flames or requests for markup. This is for others to do,
- and they are welcome to use our etexts in the doing. Repeat:
- Project Gutenberg Etexts are meant for the general population,
- NOT for the top 1% of the population who argue about whether a
- word was meant to be italicized or bolded or underlined. This
- is especially true of older books, written and published under
- the customs and practices of different times and places. This
- must be considered. So must the fact that many or most of the
- books we are going to do were not written in English, or were
- written in an English from a different place and time than the
- 20th Century American English most networkers tend to use.
- Hardly any of our etexts were written in English of that type.
- The arguments about American versus English are a
- non-sequitur (irrelevant) to most of our audience, and we must
- not spend as much time working on those aspects of a book as a
- whole new book would take us to do. The same is true of 99.9%
- accuracy. We expect to have errors in our etexts. . .etext is
- so easy to correct that people just send us notes with errors;
- we save them, and when we have a dozen we put out a new etext.
- This takes very little time: we are now on our 30th edition of
- Alice in Wonderland. Where else are you going to get editions
- improved on such a rapid basis? In fact, one of the arguments
- we hear frequently is that the errors of various editions must
- be preserved in the etext editions, or the etext editions are
- not "authoritative editions." Ladies and Gentlemen. . .I have
- just fallen off the head of the pin I have been balancing on--
- (philosophers used to argue [seriously] about how many angels,
- presuming such things as angels, could stand on the head of an
- ordinary pin [some said it was how many could DANCE on it]); at
- any rate this is more than enough for 1992, and I don't intend
- to address the questions in this section for another year.)
-
- (End of parenthetical discussion on emphasis. Back to. . . .)
- (When you read the next line you will wish you had skipped)***
-
- The reason for this is that 99% of the hardware and software a
- person is likely to run into can read and search these files.
-
- Any other system of etext storage is going to fall short of an
- audience of 99%.
-
- This does not mean there are not other valid means of doing the
- etext business. . .after all, over half the computers are DOS,
- so one could address a wide audience by just doing DOS. Plain
- Vanilla ASCII, however, addresses the audience with Apples and
- Ataris all the way to the old homebrew Z80 computers, while an
- audience of Mac, UNIX and mainframers is still included.
-
- In this same vein, Project Gutenberg selects etexts targeted a
- bit on the "bang for the buck" philosophy. . .we choose etexts
- we hope extremely large portions of the audience will want and
- use frequently. We are constantly asked to prepare etext from
- out of print editions of esoteric materials, but this does not
- provide for usage by the audience we have targeted, 99% of the
- general public.
-
- Also in the same vein, Project Gutenberg has avoided requests,
- demands, and pressures to create "authoritative editions." We
- do not write for the reader who cares whether a certain phrase
- in Shakespeare has a ":" or a ";" between its clauses. We put
- our sights on a goal to release etexts that are 99.9% accurate
- in the eyes of the general reader. Given the preferences our
- proofreaders have, and the general lack of reading ability the
- public is currently reported to have, we probably exceed those
- requirements by a significant amount. However, for the person
- who wants an "authoritative edition" we will have to wait some
- time until this becomes more feasible. We do, however, intend
- to release many editions of Shakespeare and the other classics
- for the comparative study on a scholarly level, before the end
- of the year 2001, when we are scheduled to complete our 10,000
- book Project Gutenberg Electronic Public Library.
-
- Project Gutenberg hopes to be a part of massive celebrations a
- 100th Anniversary of Public Libraries deserves in 1995, and in
- 1997 hopes to found "The Public Domain Register," on the 100th
- Anniversary of The U.S. Copyright Register.
-
- We hope you will be part of it, too. You are all invited.
-
- Footnote:
- Our eventual goal is to provide Public Domain Etext editions a
- short time after they enter the Public Domain. Of course, the
- period before a copyrighted work entered the Public Domain was
- extended from 28 years (with a 28 year extension available) to
- 50 years more than the life of the author, so this put a kink,
- to put it mildly, into our plans. (The original copyright was
- for 14 years, in the U.S.) Thus, a person could originally make
- a reasonable prediction that anything under copyright would be
- in the Public Domain while it could be used; under the new law
- it is impossible to predict the length of a copyright, and the
- likelihood of a new book entering the Public Domain during the
- lifetime of the average reader is minimal. (Suppose you might
- be 25 when you read a new book and the author is 50: wait the
- average 25 years for the author to die (what a thought!*) Now
- you have to wait another 50 years to have access to that book;
- it doesn't matter when it was written (unless it is an old one
- . . .before the period the law retroacted to). . .so you would
- have to wait (on the average) until you were 100 years old.) A
- 25-year-old under the original law would only have to wait for
- 14 years. . .until the age of 39. Quite a difference; between
- the ages of 39 and 100. Not only that, but the copyright laws
- would have to stay the same for all that time. . .something in
- serious doubt, seeing how much they have changed in the recent
- century.
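-
- The arithmetic in this footnote can be written out as a small
- sketch. The function name is our own invention, and the figures
- are simply the ones used in the example above (a reader of 25,
- a 25 year wait for the author to die, then 50 more years).

```python
def reader_age_at_public_domain(reader_age, years_until_author_dies,
                                term_after_death):
    """Age of the reader when a life-plus-term copyright expires."""
    return reader_age + years_until_author_dies + term_after_death


# Newer law from the example: 25 now, 25 year wait, then 50 more years.
new_law_age = reader_age_at_public_domain(25, 25, 50)

# Original U.S. law: a flat 14 year term from publication.
original_law_age = 25 + 14

print(new_law_age, original_law_age)
```

- The two results are the ages of 100 and 39 quoted above.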
-
- This goal of presenting Public Domain Editions immediately has
- a Public Domain Register as its predecessor. Before you expect
- the availability of all Public Domain materials, we have to at
- least come up with a way of listing what those titles are. If
- you are interested, please let us know before 1997 so we might
- be able to include your efforts in the Public Domain Register.
-
-
- The Project Gutenberg Philosophy
-
- The Project Gutenberg Philosophy is to make information, books
- and other materials available to the general public in forms a
- vast majority of the computers, programs and people can easily
- read, use, quote, and search.
-
- This has several ramifications:
-
- 1. The Project Gutenberg Etexts should cost so little that no
- one will really care how much they cost. They should be a
- general size that fits on the standard media of the time.
-
- i.e. when we started, the files had to be very small as a
- normal 300 page book took one meg of space, which no one in
- 1971 could be expected to have (in general). So doing the
- U.S. Declaration of Independence (only 5K) seemed the best
- place to start. This was followed by the Bill of Rights--
- then the whole US Constitution, as space was getting large
- (at least by the standards of 1973). Then came the Bible,
- as individual books of the Bible were not that large, then
- Shakespeare (a play at a time), and then into general work
- in the areas of light and heavy literature and references.
-
- The rate at which we have chosen to release etexts is that
- rate which will allow the general public (and us!) to grow
- without undue effort into the Electronic Public Libraries.
- We can't rely on CD's, as only a small fraction of persons
- interested in etexts have CD's. We think CD's are great but
- we can't have that as our primary means of measurement and
- distribution. Our goal is for the average user to be able
- to store our library inexpensively on standard media. The
- current standards are magnetic, with 1.44 floppies and the
- 200 and some meg hard drives being sold on the average for
- the average two or three thousand dollar computer. A 1.44
- floppy costs about fifty cents these days, in quantity (50
- or so is enough to get this price), so $25 is enough for a
- person to get into very inexpensive storage. That is just
- about $1 to store an uncompressed one thousand page book, and
- the average book can be stored on one floppy.
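-
- A quick sketch of the floppy arithmetic, using only figures
- from this file: fifty cents per floppy, and a 300 page book
- taking one meg. Reading "1.44" as 1.44 meg per floppy is our
- assumption, as are the names used below.

```python
FLOPPY_MEG = 1.44      # assumed capacity of one floppy, in meg
FLOPPY_PRICE = 0.50    # dollars per floppy, bought in quantity
PAGES_PER_MEG = 300    # a normal 300 page book takes one meg


def floppy_cost(count):
    """Dollar cost of buying `count` floppies."""
    return count * FLOPPY_PRICE


def cost_to_store(pages):
    """Approximate dollar cost of floppy space for an uncompressed book."""
    meg_needed = pages / PAGES_PER_MEG
    return meg_needed / FLOPPY_MEG * FLOPPY_PRICE


starter_cost = floppy_cost(50)            # the $25 starter batch
starter_capacity = 50 * FLOPPY_MEG        # meg of storage that buys
thousand_page_cost = cost_to_store(1000)  # a little over one dollar

print(starter_cost, round(starter_capacity), round(thousand_page_cost, 2))
```

- On these assumptions the $25 batch holds about 72 meg, and a
- thousand page book costs a little over a dollar to store.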
-
- We like to think we have planned well enough that the user
- would always be able to keep our library at an inexpensive
- price. 1.44 floppies are currently the most feasible, for
- the wallet, at least, and hard drive prices are falling to
- nearly the same price per meg level. Right now our etexts
- will fit quite nicely into one partition on the systems in
- the two to three thousand dollar range. By the end of the
- year 2001, we predict that this will still be the case, in
- terms of a much larger library, and much larger computers,
- which should also be much faster. The 786 should be out a
- year or two before that time. The default computer of ten
- years ago had maybe one meg, a few years later it was five
- and then ten, until now it is a couple hundred meg ($1798
- at most mail order and discount houses). . .our default was
- the "Best Buy" discount house which currently sells:
-
- (And we do NOT recommend Best Buy or their brands)
-
- 486SX/25, 170M drive, 4MRAM, 8K cache, 2400 modem, two
- floppies, SVGA, 24 pin printer, mouse, Windows 3.1 etc
-
- These systems are not the best hardware in the world but a
- system can be returned. Everything is already on the hard
- drive, and all you have to do is turn it on. Floppies for
- both drives are included.
-
- Again, we do not recommend any of these in particular, but
- merely use them as a default measurement. The entire text
- library of Project Gutenberg should fit nicely into these,
- and should be relatively easy to search.
-
- If these trends continue as they have for the past decade,
- then you should see something with gigabytes by 2001, in a
- similar price range.
-
- We try to keep pace with the technology available to users
- in the average ranges. We would like to grow at the rates
- they are growing, so our goal is to double our output each
- year. We are doing two books a month in 1992. We did one
- a month in 1991. We plan on four per month in 1993. This
- should be a relatively easy load for people to acquire.
-
- The total output of Project Gutenberg in 1991 was about 9M
- or maybe 10 if you kept all our notes. For the first half
- of 1992, it was about 10M of files (this includes a Bible,
- so this is a little larger). However, the main point is a
- computer such as the one described above would use only 10
- percent of its space to hold the last 24 books released by
- Project Gutenberg. We estimate each 24 books will take 10
- meg, so the entire year's output is expected to double each
- year (1991=10, 1992=20, 1993=40, 1994=80, 1995=160, etc.)
-
- Of course this will require a drive of over a gigabyte for
- 1995, if our library is to remain in one corner of it. It
- seems highly likely however, that most computers costing 2
- or 3 thousand dollars at that time will have one gigabyte,
- if not more. Our personal calculations have always been based
- on $1500 drives, as that was the cost of our first drives,
- which were 5M (ST-506). Today that $1500 will buy a gig.
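-
- The doubling schedule given above (1991=10, 1992=20, and so
- on) can be projected with a few lines of code. This is only a
- sketch: the function name is ours, and the starting figure of
- 10 meg in 1991 is the estimate quoted in this file.

```python
def yearly_output_meg(first_year=1991, first_output=10, last_year=1995):
    """Project yearly output in meg, doubling each year."""
    return {
        year: first_output * 2 ** (year - first_year)
        for year in range(first_year, last_year + 1)
    }


outputs = yearly_output_meg()
cumulative_meg = sum(outputs.values())  # whole library through 1995

for year, meg in outputs.items():
    print(year, meg)
print("total", cumulative_meg)
```

- On these figures the whole library through 1995 would come to
- 310 meg.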
-
- By the time Project Gutenberg got famous, the standard was
- 360K disks, so we did books such as Alice in Wonderland or
- Peter Pan because they could fit on one disk. Now 1.44 is
- the standard disk and ZIP is the standard compression; the
- practical filesize is about three million characters, more
- than long enough for the average book. However, we prefer
- not to require users to use compression, at least until it
- becomes a standard. That is why all our etexts are posted,
- when we have control, in both ASCII and .zip files.
-
- However, pictures are still so bulky to store on disk that
- it will still be a while before we include even the lowres
- Tenniel illustrations in Alice and Looking-Glass. However
- we ARE very interested in doing them, and are only waiting
- for advances in technology to release a test edition. The
- market will have to establish SOME standards for graphics,
- however, before we can attempt to reach general audiences,
- at least on the graphics level.
-
- To illustrate our faith in graphics, and in the future, we
- have gone one step further in our pursuit of what we named
- "Replicator Technology" TM a few years ago. We would like
- to mark the end of this phase of Project Gutenberg (at year's
- end, 2001) with a first 3D application of Replicator Technology,
- by doing CAT, MRI and XRAY Fluoroscopy scans of something,
- perhaps a painting, and printing 3D copies. If anyone can
- get us access to a hundred year old masterpiece. . . .
-
-
- 2. The Project Gutenberg Etexts should be so easily used that no
- one should ever have to care about how to use, read, quote
- and search them.
-
- This has created a need to present these Project Gutenberg
- Etexts in "Plain Vanilla ASCII" as we have come to call it
- over the years.
-
- The reason for this is simple. . .it is the only text mode
- that is easy on both the eyes and the computer.
-
- However, this encourages others to improve our etexts in a
- variety of ways and to distribute them in a variety of the
- available media, as follows:
-
- Once an etext is created in Plain Vanilla ASCII, it is the
- foundation for as many editions as anyone could hope to do
- in the future. Anyone desiring an etext edition matching,
- or not matching, a particular paper edition can readily do
- the changes they like without having to prepare that whole
- book again. They can use the Project Gutenberg Etext as a
- foundation, and then build in any direction they like.
-
- Thus any complaints about how we do italics, bold, and the
- underscoring, or whether we should use this or that markup
- formula are sent back with encouragement to do it any way
- any person wants it, and with the basic work already done,
- with our compliments.
-
- The same goes for media. We have had a long-standing work
- ethic of providing our etexts in any medium people wanted:
- Amiga, Apple, Atari. . .to IBM, to Mac, to TRS-80. . . .
-
- However, now that our etexts are carried in so many BBS's,
- networks and other locations, it is easier to download the
- files in a manner that puts them in your format than for us to
- make and mail a disk, so we don't really do that too much.
-
- The major point of all this is that years from now Project
- Gutenberg Etexts are still going to be viable, but program
- after program, and operating system after operating system
- are going to go the way of the dinosaur, as will all those
- pieces of hardware running them. Of course, this is valid
- for all Plain Vanilla ASCII etexts. . .not just those your
- access has allowed you to get from Project Gutenberg. The
- point is that a decade from now we probably won't have the
- same operating systems, or the same programs and therefore
- all the various kinds of etexts that are not Plain Vanilla
- ASCII will be obsolete. We need to have etexts in files a
- Plain Vanilla search/reader program can deal with; this is
- not to say there should never be any markup. . .just that those
- forms of markup should be easily convertible into regular,
- Plain Vanilla ASCII files so their utility does not expire
- when programs to use them are no longer with us. Remember
- all the trouble with CONVERT programs to get files changed
- from old word processor programs into Plain Vanilla ASCII?
-
- Do you want to go through all that again with every book a
- whole world ever puts into etext?
-
- The value of Plain Vanilla ASCII is obvious. . .so is very
- much of the value of most of the various markup systems we
- have in the world. But until some real standards arrive--
- we would be limiting our options a great deal if we do not
- keep copies of all etexts in Plain Vanilla ASCII as well.
-
- We don't have anything against markup. Not vice versa.
-
- Alice in Wonderland, the Bible, Shakespeare, the Koran and
- many others will be with us as long as civilization. . .an
- operating system, a program, a markup system. . .will not.
-
- This includes the many requests we have for compression in
- particular formats. There are only two formats we know of
- that are suitable for transfer to a wide general audience:
- Plain Vanilla ASCII (.txt files) and ZIPped files of them,
- (.zip files). Requests for other compression formats must
- be ignored as they are appropriate only for small portions
- of our target audience. However, (programmers take note:
- we will need help) we are planning to put some compression
- links on our files so they can be transmitted in any of an
- assortment of compression formats on the fly. i.e. we should
- be able to generate any kind of file asked for, but we can
- keep only one copy of each etext on our servers. . .as the
- .Z compression format does in a similar manner today.
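-
- For programmers, the idea of keeping one copy and generating
- other formats on request might be sketched as follows. This
- is not how our servers work; the function names, the format
- labels, and the choice of gzip and bz2 as stand-ins for "other
- compression formats" are all assumptions made for the example.

```python
import bz2
import gzip
import io
import zipfile


def as_zip(data: bytes, name: str = "etext.txt") -> bytes:
    """Wrap the text in a one-file ZIP archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, data)
    return buffer.getvalue()


# Hypothetical table of formats; only the plain text is ever stored.
ENCODERS = {
    "txt": lambda data: data,
    "zip": as_zip,
    "gz": gzip.compress,
    "bz2": bz2.compress,
}


def serve(master_copy: bytes, fmt: str) -> bytes:
    """Return the single stored copy encoded in the requested format."""
    try:
        encoder = ENCODERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return encoder(master_copy)
```

- Adding a format then means adding one entry to the table, with
- no second copy of any etext kept on disk.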
-
-
- 3. The selection of Project Gutenberg Etexts
-
- There are three portions of the Project Gutenberg Library,
- basically described as:
-
- A. Light Literature; such as Alice in Wonderland, Through
- the Looking-Glass, Peter Pan, Aesop's Fables, etc.
-
- B. Heavy Literature; such as the Bible or other religious
- documents, Shakespeare, Moby Dick, Paradise Lost, etc.
-
- C. References; such as Roget's Thesaurus, almanacs, and a
- set of encyclopedias, dictionaries, etc.
-
- The Light Literature Collection is designed to get persons
- to the computer in the first place, whether the person may
- be a pre-schooler or a great-grandparent. We love it when
- we hear about kids or grandparents taking each other to an
- etext of Peter Pan when they come back from watching HOOK
- at the movies, or when they read Alice in Wonderland after
- seeing it on TV. We have also been told that nearly every
- Star Trek movie has quoted current Project Gutenberg etext
- releases (from Moby Dick in The Wrath of Khan; a Peter Pan
- quote finishing up the most recent, etc.) not to mention a
- reference to Through the Looking-Glass in JFK. This was a
- primary concern when we chose the books for our libraries.
-
- We want people to be able to look up quotations they heard
- in conversation, movies, music, other books, easily with a
- library containing all these quotations in an easy to find
- etext format. With Plain Vanilla ASCII you will be easily
- able to search an entire library, without any program more
- sophisticated than a plain search program. In fact, these
- Project Gutenberg Etext files are so plain that you can do
- a search on them without even using an intermediate search
- program (i.e. a program between you and the disk). Norton's
- and other direct disk access programs can search every one
- of your files without you even naming them, pointing to an
- etext directory, or whatever. You can simply search a raw
- output from the disk. . .I do this on a half gigabyte disk
- partition, containing all our editions.
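-
- A "plain search program" of the kind meant here need be no
- more than the sketch below, which looks for a phrase in every
- .txt file in a directory. The function name and the directory
- layout are assumptions made for the example.

```python
from pathlib import Path


def search_etexts(directory, phrase):
    """Return (filename, line number, line) for each line with the phrase."""
    needle = phrase.lower()
    hits = []
    for path in sorted(Path(directory).glob("*.txt")):
        # Undecodable bytes are replaced rather than stopping the search.
        with open(path, encoding="ascii", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line.lower():
                    hits.append((path.name, line_number, line.rstrip("\n")))
    return hits
```

- The search ignores case and works a line at a time, so it will
- miss a quotation that is split across two lines of the etext.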
-